Abstract

Background: Collagens require the hydroxylation of proline (Pro) residues in the repeating Xaa-Pro-Gly sequence of their triple-helical domain to function properly as a main structural component of the extracellular matrix in animals under physiologically relevant conditions. This regioselective proline hydroxylation is catalyzed by a specific prolyl 4-hydroxylase (P4H) as a posttranslational processing step.

Results: A recombinant human collagen type I α-1 (rCIα1) with a high percentage of hydroxylated prolines (Hyp) was produced in transgenic maize seeds when co-expressed with both the α- and β-subunits of a recombinant human P4H (rP4H). Germ-specific expression of rCIα1 using the maize globulin-1 gene promoter resulted in an average yield of 12 mg/kg seed for the full-length rCIα1 in seeds without co-expression of rP4H and 4 mg/kg seed for the hydroxylated rCIα1 (rCIα1-OH) in seeds with co-expression of rP4H. High-resolution mass spectrometry (HRMS) analysis revealed that nearly half of the collagenous repeating triplets in rCIα1 isolated from the rP4H co-expressing maize line had their Pro residues changed to Hyp residues. The HRMS analysis determined the Hyp content of maize-derived rCIα1-OH as 18.11%, which is comparable to the Hyp levels of yeast-derived rCIα1-OH (17.47%) and native human CIα1 (14.95%). The increased Hyp percentage correlated with a markedly enhanced thermal stability of maize-derived rCIα1-OH compared to the non-hydroxylated rCIα1.

Conclusions: This work shows that maize has the potential to produce adequately modified exogenous proteins with mammalian-like post-translational modifications that may be required for their use as pharmaceutical and industrial products.



Background 

Collagen is the most abundant protein found in animals. It has been used widely for industrial and medical applications such as drug delivery and tissue engineering [1,2]. Human type I collagen is the most abundant collagen type in the human body and is also one of the most studied. It is a heterotrimer composed of two α1 (CIα1) chains and one α2 (CIα2) chain, with a helical region made up of repeating Xaa-Yaa-Gly triplets, where Xaa and Yaa are typically proline (Pro) and hydroxyproline (Hyp) [3]. Collagens used commercially are traditionally extracted from animal tissues. These products contain different types of collagen and may be contaminated with potentially immunogenic and infective agents considered hazardous to human health. Thus, recombinant technology has been developed to produce high-quality collagens free of animal-derived contaminants. Recombinant collagens have been produced in mammalian cells [4], insect cell cultures [5], yeast [6], and plant cell culture [2,7].

Transgenic plant systems have advantages over other recombinant production systems in terms of lower cost, higher capacity, lower risk of contamination with infective agents or toxins, and inexpensive storage that facilitates processing [8,9]. The production of plant-derived recombinant collagen type I α-1 (rCIα1) was reported in 2000 using tobacco [10] and tobacco cell culture [2]. The rCIα1 was also expressed in transgenic maize seed [11,12] and barley [13].

A challenge in producing rCIα1 in non-mammalian expression systems such as transgenic plants is the resulting low regioselective hydroxyproline content, which makes the product unstable at physiologically relevant temperatures. In humans, the 4-hydroxyproline residues synthesized by prolyl 4-hydroxylases (P4Hs) as a posttranslational modification increase the stability of the collagen triple helix [14], primarily through stereoelectronic effects [15]. By contrast, the hydroxyproline content of rCIα1 is almost zero in transgenic tobacco [10] and very low in transgenic maize [11] when rCIα1 is not co-expressed with P4H. Since the endogenous P4Hs of insects, microbes and plants are not able to achieve the level of rCIα1 hydroxylation found in the human CIα1 chain, co-expression of a recombinant animal P4H (rP4H) with collagen is necessary to increase the hydroxyproline content of rCIα1 and deliver a stable product. In tobacco, co-expression of a P4H with an α subunit from C. elegans and a β subunit from mouse [16], or of a recombinant human P4H [17], led to increased hydroxyproline levels in rCIα1. Similar results were seen in tobacco cell culture [2]. However, the tobacco-derived collagen still had a lower Hyp content than native human CIα1, making this product unsuitable for many applications.

In this study, we generated transgenic maize lines expressing the human rCIα1 gene alone or co-expressing both the rCIα1 and rP4H genes. Using high-resolution mass spectrometry (HRMS) analysis, we measured the percentages of Hyp and Pro residues in the rCIα1 protein extracted from transgenic maize seeds as well as the actual positions of the hydroxylated prolines. We also performed in vitro pepsin treatment at different temperatures to compare the thermal stabilities of maize-derived hydroxylated and non-hydroxylated rCIα1 proteins. Here, we report for the first time that, by co-expressing rP4H genes, maize can produce rCIα1 with a hydroxyproline content comparable to native human type I collagen. This achievement provides further confirmation that maize seeds can be used to produce exogenous proteins that require mammalian-like posttranslational modifications for use in specific applications.

Results 

Generation of maize lines expressing rCIα1 with and without rP4H co-expression in seeds

The constructs used in this study are shown in Figure 1. The CGB construct carries a gene encoding a recombinant full-length human collagen type I α-1 (rCIα1), and the CGD construct carries the rCIα1 gene and both the α and β subunits of recombinant human prolyl 4-hydroxylase, rP4Hα and rP4Hβ. The rCIα1 gene was partially maize codon-optimized and its expression was driven by a maize embryo-specific globulin-1 promoter (Pglb, [18]). A barley alpha amylase signal sequence (BAASS, [19]) was used as a substitute for the human CIα1 signal peptide (UniProtKB/Swiss-Prot: P02452 [1-22]). The combination of an embryo-specific promoter and the BAASS has been shown to give high expression of foreign proteins in maize seed [20-22]. The rCIα1 gene lacks the N-propeptide but contains the telopeptide sequences at both the N- and C-terminal regions. A 29 amino acid bacteriophage T4 fibritin foldon peptide sequence [23] was fused to the C-terminus of rCIα1, replacing the C-propeptide. The foldon, like the native C-propeptide, facilitates rCIα1 triple-helical assembly and enhances its stability [23]. To avoid undesired DNA rearrangements caused by identical sequences (such as the same promoter driving multiple genes in a single construct), we chose the maize ubiquitin promoter (Pubi, [24]) to drive the expression of the α and β subunits of rP4H. It was shown previously that recombinant protein accumulates preferentially in germ tissue when expressed from the ubiquitin promoter [25].

Both constructs were introduced into maize Hi II germplasm using immature embryos via an Agrobacterium-based transformation system. Twelve independent transgenic events for CGB and 21 events for CGD were recovered and brought to maturity in the greenhouse using pollen donors from an elite inbred. Initial transgene expression screens were conducted on both callus and T1 seeds using an enzyme-linked immunosorbent assay (ELISA) to detect the expression of rCIα1 and of the α and β subunits of rP4H. T2 seeds from the events with the highest transgene expression were produced by self-pollination.

Figure 1 A schematic representation of the two constructs used in this study. LB, left border of Agrobacterium T-DNA; T35S, CaMV 35S terminator; bar, bialaphos resistance coding sequence; P35S, CaMV 35S promoter; PIN II, potato protease inhibitor II gene terminator; CIα1, human collagen I α1 chain coding sequence; BAASS, barley alpha amylase signal sequence; P4Hα, prolyl 4-hydroxylase α subunit; P4Hβ, prolyl 4-hydroxylase β subunit; Pglb, maize globulin-1 promoter; Pubi, maize ubiquitin promoter; RB, right border of Agrobacterium T-DNA.

For T1 seed analysis, seeds from multiple plants derived from each event were analyzed. The in-depth molecular and biochemical characterizations of rCIα1 described in this study were performed on T2 seeds from one selected CGB event and one selected CGD event. Individual seeds of the transgenic events were analyzed by polymerase chain reaction (PCR) to separate transgene-positive seeds from negative ones. Positive seeds were pooled and analyzed by ELISA for the expression of rCIα1. Negative null segregant seeds were used as controls.

The average expression level of rCIα1 measured by ELISA in event CGB was 1.86 ± 1.26 mg/g of total soluble protein (TSP) or 12.14 ± 8.06 mg/kg of dry seed weight (DSW). The highest rCIα1 content measured to date from a single CGB seed was 3.54 mg/g TSP or 25.11 mg/kg DSW. The average expression level of rCIα1 in event CGD was about three times lower than that in event CGB, at 0.58 ± 0.26 mg/g TSP or 4.40 ± 2.09 mg/kg DSW. The highest rCIα1 expression in a single CGD seed was 0.92 mg/g TSP or 7.54 mg/kg DSW.

Figure 2 shows the detection of rCIα1 in the total protein extracts from CGB and CGD seeds. Because of the low expression level of rCIα1 in CGD seeds, we concentrated the extract using an Amicon Ultra-15 Centrifugal Filter Unit with Ultracel-30 membrane (cat # UFC903008, Millipore) before loading it on the gel. rCIα1 could be detected in both CGB and CGD protein samples using anti-foldon antibody.



[Figure 2 lane labels: CGB, CGD, M (kDa); M, CGB+CGD, FE285, FE291, M (kDa)]

Figure 2 Analysis of the difference in electrophoretic mobility of rCIα1 in the CGB and CGD lines. Equal volumes of total protein extracts from seeds of CGB, CGD (10x concentrated by volume) and a mixture of CGB+CGD extracts were loaded on the gel. The rCIα1 from the CGB and CGD lines was detected by western blot using anti-foldon antibody. Pichia-derived rCIα1 (FE291) and rCIα1-OH (FE285) were included as controls and detected by Coomassie Brilliant Blue staining. Open arrows, rCIα1 from CGB or FE291; solid arrows, rCIα1-OH from CGD or FE285. M, molecular weight marker.



No cross-reacting band at a similar position could be detected in non-transgenic maize seeds (data not shown). The CGB rCIα1 migrated faster than its CGD counterpart (Figure 2, open arrow in lane CGB vs solid arrow in lane CGD), suggesting different electrophoretic mobility for these two proteins. To exclude the possibility that the observed migration difference was due to lane shifts during electrophoresis, we mixed the TSPs from CGB and CGD before loading on the gel. Lane CGB+CGD of Figure 2 shows two distinct major bands that cross-reacted with anti-foldon antibody. This result indicates that the rCIα1 proteins derived from the maize CGB and CGD events have different electrophoretic mobility, with CGD rCIα1 moving more slowly than CGB rCIα1. The altered electrophoretic mobility may reflect an increase in the molecular weight of rCIα1 that is due partially to the increased number of hydroxylated prolines in rCIα1 from the CGD event, which also co-expresses the rP4H genes. The same difference in electrophoretic mobility can also be seen between Pichia-derived rCIα1 (Figure 2, FE291) and hydroxylated rCIα1 (Figure 2, FE285).
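As a rough illustration of the mass argument above, the sketch below estimates how much mass the additional Hyp residues would add. It assumes only that each Pro to Hyp conversion adds one oxygen atom (about 15.9949 Da monoisotopic) and uses the Hyp counts reported in Table 1 for the 475-residue region covered in all samples; the function and variable names are illustrative. The result is a lower bound for that region only, and it does not by itself predict the size of the band shift on a gel.

```python
# Monoisotopic mass of one oxygen atom (Da): the net change when Pro becomes Hyp.
OXYGEN_MONOISOTOPIC_DA = 15.9949


def hydroxylation_mass_shift(n_hyp: int) -> float:
    """Return the mass (Da) added by n_hyp Pro -> Hyp conversions."""
    if n_hyp < 0:
        raise ValueError("n_hyp must be non-negative")
    return n_hyp * OXYGEN_MONOISOTOPIC_DA


# Hyp counts in the 475-residue region covered in all five samples (Table 1).
hyp_cgb, hyp_cgd = 28, 86
extra_hyp = hyp_cgd - hyp_cgb
shift_da = hydroxylation_mass_shift(extra_hyp)
print(f"{extra_hyp} additional Hyp -> about {shift_da:.0f} Da in the shared region")
```

Because less than half of the 1057-residue sequence lies in the shared region, the shift for the whole chain would be expected to be larger than this figure.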

The expression of the β subunit of rP4H in the CGD seeds was verified by western blotting with an anti-P4Hβ monoclonal antibody (Figure 3). A main band at ~60 kD (open triangle, Figure 3) was detected in CGD, but not in CGB or the non-transgenic wild type maize control, as expected. A weak secondary band detected in CGD is likely due to cross-reactivity with other forms of rP4Hβ in maize. Detection of the α subunit of P4H was performed in transgenic callus but not in seeds (data not shown).

To determine whether rCIα1 can also be detected in tissues other than seeds, we performed both protein and transcript analyses of rCIα1 in CGB and CGD plants. Maize leaf samples from 5 different developmental stages were collected. Total RNA and proteins were prepared from these tissues and subjected to reverse transcriptase PCR (RT-PCR) and ELISA, respectively. No rCIα1 transcript or protein could be detected in these samples (data not shown), suggesting that, as expected, rCIα1 is not expressed in leaf tissue in either line.
Figure 3 Western blot analysis of rP4Hβ using anti-P4Hβ antibody. Equal volumes of total protein extracts from seeds of CGB, CGD and non-transgenic control maize (WT) were loaded on the gel. Open triangle, rP4Hβ. M, molecular weight marker.

Co-expression of rP4H increases the hydroxylation of rCIα1

To examine the percentage and positions of the prolines that were hydroxylated by the co-expression of rP4H in the CGD event, we carried out a proteomic analysis of gel-purified rCIα1 using liquid chromatography tandem mass spectrometry (LC-MS/MS) on a Linear Ion Trap Orbitrap (LTQ Orbitrap) mass spectrometer, a high-resolution mass spectrometry (HRMS) instrument. HRMS can not only verify the amino acid sequence of rCIα1 but also identify the positions of hydroxylated proline residues (Hyp). In addition to the maize-derived rCIα1 proteins from CGB and CGD, we included three control samples: a gel-isolated CIα1 fragment from human collagen (cat # 234138, CalBiochem Inc), Pichia-derived rCIα1 (isolated from strain FE291, which does not co-express rP4H) and Pichia-derived hydroxylated rCIα1 (isolated from strain FE285, which co-expresses rP4H) (FibroGen).

Results are summarized in Figure 4 and Table 1. The protein sequence coverage by HRMS (yellow-highlighted sequences in Figure 4) for the five samples ranged from 58.66% (human CIα1) to 85.81% (Pichia CIα1-OH). To compare the percentages and positions of Hyp in each sample, we chose the peptide regions (475 AA) that were covered by HRMS in all five CIα1 proteins (red boxes in Figure 4). These common peptide regions represent 44.94% of the full-length CIα1 sequence (1057 AA). A total of 114 Pro and Hyp residues out of the 475 amino acids (24.00%) were identified by HRMS in all samples (Figure 4).

For the two maize-derived CIα1 samples, a total of 28 and 86 Hyp were identified from CGB and CGD, respectively (green-highlighted amino acids in Figure 4), representing Hyp percentages of 5.89% and 18.11% for these two lines (Table 1). This result indicates that the co-expression of rP4H in maize can greatly enhance the hydroxylation of prolines on collagen molecules. The increased number of Hyp in rCIα1 from CGD samples may have partially contributed to the increased molecular weight and thereby the decreased migration rate (Figure 2).

Because rP4H catalyzes hydroxylation of Pro residues in the Yaa position of the Xaa-Yaa-Gly triplets within collagen strands [26], we further compared the Pro residues in all Xaa-Pro-Gly triplets in the maize CGB and CGD lines. HRMS analysis identified 752 AA (71.14%) from maize rCIα1 (CGB) and 818 AA (77.39%) from maize rCIα1-OH (CGD), as shown in Table 1 and Figure 4 (bold and yellow-highlighted letters). Among these HRMS-identified AA, we chose the 652 AA that were shared by CGB and CGD and identified a total of 90 collagenous triplets within them. Of these 90 triplets, 44 (48.9%) have the Pro residue changed to Hyp in both the CGB and CGD lines (double-underlined triplets in Figure 4), and 5 (5.6%) have the Pro unchanged in both lines (single-underlined triplets in Figure 4). The remaining 41 triplets (45.6%) have the Pro residue changed to Hyp only in rCIα1 isolated from CGD (black boxes in Figure 4), indicating that nearly half of the collagenous triplets were posttranslationally modified as a result of the co-expression of rP4H genes in the CGD maize line.
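The triplet breakdown above can be re-derived from the three reported counts. The following is a minimal sketch that tallies them and recomputes the percentages; the category labels are illustrative and the counts are those given in the text.

```python
from collections import Counter

# Counts reported for the 90 Xaa-Pro-Gly triplets found in the 652 residues
# covered by HRMS in both maize lines.
triplet_counts = Counter({
    "Hyp in both CGB and CGD": 44,
    "Pro in both CGB and CGD": 5,
    "Hyp in CGD only": 41,
})

total = sum(triplet_counts.values())
fractions = {label: 100 * n / total for label, n in triplet_counts.items()}

for label, pct in fractions.items():
    print(f"{label}: {triplet_counts[label]}/{total} = {pct:.1f}%")
```

The "Hyp in CGD only" class is the one attributable to rP4H co-expression, which is why it is the figure quoted as "nearly half".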

Seventy-one Hyp residues out of the 475 AA (14.95%) were identified by HRMS in the human CIα1 control sample (Table 1). For the Pichia samples, only two Hyp residues (0.42%) were found in non-hydroxylated CIα1 (FE291), whereas 83 Hyp residues (17.47%) were found in hydroxylated CIα1 (FE285), indicating that the co-expression of P4H in Pichia had also dramatically increased proline hydroxylation in collagen (Table 1).
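The Hyp percentages quoted for the five samples all use the same denominator, the 475 residues covered by HRMS in every sample. A short sketch of that calculation, using the Hyp counts from Table 1 and illustrative sample keys, is:

```python
COMMON_REGION_AA = 475  # residues covered by HRMS in all five samples

# Hyp residues identified in the common region (Table 1).
hyp_counts = {
    "Pichia FE291": 2,
    "Maize CGB": 28,
    "Human": 71,
    "Maize CGD": 86,
    "Pichia FE285": 83,
}


def hyp_percent(n_hyp: int, n_residues: int = COMMON_REGION_AA) -> float:
    """Hyp residues as a percentage of all residues in the compared region."""
    return 100 * n_hyp / n_residues


hyp_pct = {sample: hyp_percent(n) for sample, n in hyp_counts.items()}
for sample, pct in hyp_pct.items():
    print(f"{sample}: {pct:.2f}%")
```

Note that these values are fractions of all residues in the shared region, not fractions of proline positions, so they are not directly comparable with the triplet-level percentages.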

Co-expression of rP4H enhances the thermal stability of rCIα1

To further characterize the maize-derived rCIα1 and rCIα1-OH, we carried out a thermal stability analysis in which protein samples were held at 4°C or heated to temperatures ranging from 29 to 38.6°C for 6 minutes and then digested with pepsin at 10°C for 15 minutes. The proteolytic resistance of the maize-derived collagens was compared with that of native human collagen and of recombinant collagen from Pichia pastoris. Figure 5A shows a western blot, probed with anti-foldon antibody, of the collagens' proteolytic resistance after the 4°C treatment. Neither the non-hydroxylated rCIα1 from maize (CGB) nor the non-hydroxylated rCIα1 from Pichia (FE291) was detected after pepsin treatment. By contrast, the hydroxylated rCIα1 could be detected in both the maize (CGD) and Pichia (FE285) samples, suggesting that the pepsin resistance of these collagens was associated with their higher percentage of Hyp residues.

The thermal stability of rCIα1 was further characterized by determining the melting temperature (Tm) using western analysis. Two different antibodies, anti-foldon and anti-25 kD collagen, were used. In our hands, the anti-foldon antibody gave results with fewer non-specific cross-reactive background bands, while the anti-25 kD antibody appeared to be more sensitive. Because native human CIα1 can only be detected with the anti-25 kD antibody, we used both antibodies in this study. In the experiments shown in Figure 5B, maize seed TSP from CGB and CGD was extracted and concentrated as described above. The quantities of maize-derived rCIα1 were estimated by ELISA. Approximately 50-100 ng/reaction of rCIα1 from CGB and CGD was used for pepsin treatment. As a control, commercial human collagen (2 μg/reaction) was spiked into TSP extracted
[Figure 4: aligned amino acid sequences of Pichia rCIα1, Maize rCIα1, Human CIα1, Maize rCIα1-OH and Pichia rCIα1-OH (sequence positions 860-1040 marked)]



Figure 4 LC-MS/MS analysis of rCIα1. Full-length peptide sequences of 1057 amino acids are listed. Pichia rCIα1, Pichia-derived rCIα1 from strain FE291; Maize rCIα1, maize-derived rCIα1 from line CGB; Human CIα1, gel-isolated CIα1 fragment from commercial collagen (CalBiochem Inc); Maize rCIα1-OH, maize-derived rCIα1 from line CGD; Pichia rCIα1-OH, Pichia-derived rCIα1 from strain FE285. Yellow-highlighted letters: amino acid sequences identified by the Orbitrap; green-highlighted letters: Hyp residues identified by the Orbitrap; red boxes: peptide regions identified in all five samples by the Orbitrap; black boxes: collagenous triplets Xaa-Pro-Gly with Pro changed to Hyp in Maize rCIα1-OH but not in Maize rCIα1; single underlines: triplets with Pro unchanged in both maize lines; double underlines: triplets with Pro changed to Hyp in both maize lines.
Table 1 Summary of HRMS analysis on five CIα1 (1057 AA^) samples from maize, Pichia and human

Measure | Pichia CIα1 (FE291) | Maize CIα1 (CGB) | Human CIα1 | Maize CIα1-OH (CGD) | Pichia CIα1-OH (FE285)
Peptides identified by HRMS (highlighted in yellow, Figure 4):
Total # AA | 680 | 752 | 620 | 818 | 907
Percent HRMS coverage | 64.33% | 71.14% | 58.66% | 77.39% | 85.81%
Peptide regions identified in all five CIα1 (475 AA) by HRMS (red boxes, Figure 4):
Total # Hyp identified by HRMS (in green) | 2 | 28 | 71 | 86 | 83
Percent Hyp identified by HRMS | 0.42% | 5.89% | 14.95% | 18.11% | 17.47%
% Hyp (by AA analysis) | N/A | 1.23%^ | 10.8%^ | N/A | 11.54%^

^ sequences present in all five samples
^ from reference [11]
^ from reference [28]



from non-transgenic maize seed for pepsin treatment. As can be seen in Figure 5B, both maize-derived rCIα1 (CGB) and rCIα1-OH (CGD) were as stable as the human collagen control at all temperatures tested in the absence of pepsin. When digested with pepsin, the maize-derived non-hydroxylated rCIα1 (CGB) was degraded even after treatment at 4°C, the lowest temperature tested. By contrast, the hydroxylated rCIα1-OH (CGD) could still be detected after treatment at temperatures as high as 35°C using the anti-foldon antibody and 38.6°C using the anti-25 kD antibody. The difference in Tm results was likely due to the different sensitivities and epitope recognition sites of the two antibodies. Interestingly, the native human collagen control could only withstand digestion after treatment at temperatures up to around 31°C. This observation is in agreement with the HRMS analysis of the collagens described in Table 1 and Figure 4: because the maize-derived rCIα1-OH has a higher Hyp percentage (18.11%) than the human collagen control (14.95%), the additional Hyp residues would be expected to increase the thermal stability of the collagen molecules.



[Figure 5 panel labels: (A) lanes CGB, FE291, M, CGD, FE285 (kDa markers at 150 and 100); (B) samples CGB, CGD and HuC probed with anti-foldon and anti-25 kD antibodies, without and with pepsin, after treatment at 4, 29, 31, 33, 35, 37, 38.2 and 38.6 °C]





Figure 5 Thermal stability analysis of rCIα1 from maize, Pichia and human. (A) Western blot results of the maize-derived rCIα1 (CGB) and rCIα1-OH (CGD) and the Pichia-derived rCIα1 (FE291) and rCIα1-OH (FE285) after 4°C incubation and pepsin treatment, using anti-foldon antibody. (B) Western blot results of the maize-derived rCIα1 (CGB), rCIα1-OH (CGD), and human CIα1 (HuC) after heat treatments at various temperatures and pepsin treatments as indicated, using both anti-foldon and anti-25 kD collagen antibodies. M: molecular weight marker.
Table 2 Summary of plant-derived recombinant human collagen I α1

Expression system | Collagen regulatory sequences | Collagen gene | Collagen yield | rP4H | Hydroxylation content (%) | Reference
Tobacco leaf | P35S (constitutive) + PR-protein SP | proα1(I), ΔNproα1(I) | 30 mg/kg powdered plants | N/A | 0.53 | [10]
Tobacco leaf | P35S (constitutive) + PR-protein SP | ΔNproα1(I) | N/A | N/A | N/A | [27]
Tobacco leaf | L3 + PR-protein SP | ΔNproα1(I) | 0.5-1 mg/kg leaf material | PI 287 + native SP + C. elegans P4Hα/mouse P4Hβ | 8.41 | [16]
Tobacco | rbcS1 (constitutive) + vacuole or apoplast targeting SPs | proα1(I)/proα2(I) | 200 mg/kg fresh leaves | P35S (constitutive) + vacuole or apoplast targeting SPs + human P4Hα/β | 7.55 | [17]
Barley PI cell | Ubi (constitutive) + At chitinase SP + HDEL (ER retention) | proα1(I) | 2-9 μg/L cell culture | N/A | N/A | [7]
Barley seed | GluB1 (endosperm specific) + At chitinase SP + HDEL (ER retention) | CIα1, 45 kD | Below detectable level (CIα1); 45 mg/kg seed (45 kD) | N/A | N/A (CIα1); 2.8 (45 kD) | [13]
Maize seed | globulin-1 (embryo specific) + barley α-amylase SP | 44 kD | 20 mg/kg seed | N/A | 2.01 | [29]
Maize seed | globulin-1 (embryo specific) + barley α-amylase SP | CIα1 | 3 mg/kg seed | N/A | 1.23 | [11]
Maize seed | globulin-1 (embryo specific) + barley α-amylase SP | CIα1, 44 kD | 15.9 mg/kg germ (CIα1); 49.6 mg/kg germ (44 kD) | N/A | N/A | [12]
Maize seed | globulin-1 (embryo specific) + barley α-amylase SP | CIα1 | 12 mg/kg seed (CIα1); 4 mg/kg seed (CIα1-OH) | Pubi (constitutive) + barley α-amylase SP + human P4Hα/β | 18.11 | This study

proα1(I): human type I procollagen α1 chain
ΔNproα1(I): human type I procollagen α1 chain lacking N-propeptide
ΔNCproα1(I): human type I procollagen α1 chain lacking N-propeptide and C-propeptide
CIα1: sequence information is not clear
44 kD: 44 kD fragment of CIα1
45 kD: 45 kD fragment of CIα1



Discussion 

The production of plant-derived recombinant collagens has been reported in tobacco leaves, barley cell culture and seeds, and maize seeds, as summarized in Table 2. In previous tobacco studies, different combinations of recombinant human collagens (i.e. rCIα1, rCIα2, and N-propeptide-free rCIα1) were used to improve the production of homotrimeric or heterotrimeric recombinant human type I collagen [10,16,17,27]. In a recent paper, Stein et al [17] achieved a high expression level of 200 mg/kg fresh leaves by expressing the collagens under a Chrysanthemum rbcS1 promoter and a vacuolar-targeting signal sequence. Early tobacco-derived collagens had very low levels of Hyp (0.53%, [10]). With co-expression of C. elegans P4Hα/mouse P4Hβ [16] or human rP4Hα/β [17], Hyp levels were increased to 8.41% and 7.55%, respectively. However, this enhanced Hyp level in tobacco is still lower than that of native human collagen CIα1, which is reported as 10.8% by amino acid analysis [28] or around 15% by HRMS analysis (this work).

Both the full-length rCIα1 and a smaller fragment (45 kD) were produced in barley cell culture [7] and barley seeds [13]. The barley-derived 45 kD collagen has a Hyp content of 2.8% when produced in seeds without co-expression of rP4H genes [13].

Previous work on the fractionation, purification and characterization of maize-derived full-length collagen and a smaller fragment (44 kD) suggested accumulation levels of about 3 mg/kg DSW for the full-length protein and 20 mg/kg DSW for the 44 kD fragment [11,29]. A similar maize line accumulating full-length rCIα1 (CGB) was used in this study. In our case, the collagen yield of the rCIα1-accumulating line without P4H co-expression (CGB) averages 12 mg/kg DSW, while that of the line with P4H co-expression (CGD) is about 4 mg/kg DSW. The Hyp percentage in the rCIα1 protein of CGB was reported as 1.23% using total amino acid composition (AAC) analysis [11]; however, it was measured at 5.89% by HRMS analysis in our study. Similarly, the Hyp percentage in human CIα1 was reported as 10.8% using AAC analysis [28], whereas in our HRMS analysis it measured around 15%. It is not clear why the Hyp percentages of CIα1 proteins were uniformly higher in HRMS analysis than in AAC analysis. The discrepancy may reflect the different degrees of resolution of these two very different methodologies. Because the concentrations of rCIα1 and rCIα1-OH obtained from maize seeds were too low to be measured by AAC analysis in our study, we were not able to obtain AAC results for comparison.

P4H is an enzyme that regioselectively modifies the Pro residues in collagenous Xaa-Pro-Gly triplets [30,31] in the ER as a posttranslational modification. Compared to the Pichia recombinant protein production system, maize can produce hydroxylated rCIα1 with a comparable Hyp percentage (Table 1, 18.11% in maize CGD vs 17.47% in Pichia FE285). Interestingly, rCIα1 produced in maize without rP4H co-expression has a higher baseline Hyp percentage than rCIα1 from Pichia without rP4H co-expression (Table 1, 5.89% in maize CGB vs 0.42% in Pichia FE291). Small numbers of prolines at both the Xaa and Yaa positions were hydroxylated in the CGB maize line without co-expression of P4H (data not shown). It is likely that the rCIα1 produced in maize is also a substrate, albeit a less efficiently modified one, for plant endogenous P4Hs [30].

Conversely, the expression of human rP4H in maize 
may also catalyze hydroxylation of Pro residues in any 
plant endogenous proteins with collagenous domains. 
We checked the amino acid sequences of three abundant
seed storage proteins in maize (19 kD and 22 kD α-zein,
and 27 kD γ-zein) and did not find any collagenous
triplets (X. Xu, unpublished). Therefore we do not
expect any Pro to Hyp modifications on these seed storage
proteins in the rP4H-expressing CGD line. In fact,
the Hyp-only AAC analysis of both CGB and CGD
seeds showed no differences in Hyp content (X. Xu,
unpublished). However, because both the α and β subunits
of rP4H were under the control of the constitutive
ubiquitin promoter, it is possible that collagenous
triplet domains on any protein in plant cells can be
modified by rP4H in such transgenic lines. It may be
desirable in the future to restrict rP4H expression to
seed tissue by using seed-specific promoters.
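
A check of this kind, scanning a protein sequence for
Xaa-Pro-Gly triplets, can be sketched in a few lines of
Python. This is only an illustration and not the procedure
actually used in the study; the function name and the toy
sequence are hypothetical, and the toy sequence is not a
real zein sequence.

```python
import re

# Xaa-Pro-Gly in one-letter code: any residue followed by P then G.
# The lookahead makes the scan report overlapping triplets as well.
XAA_PRO_GLY = re.compile(r"(?=([A-Z]PG))")

def find_xaa_pro_gly(sequence):
    """Return (0-based start, triplet) for every Xaa-Pro-Gly found."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in XAA_PRO_GLY.finditer(seq)]

# Hypothetical toy sequence for illustration only.
toy_sequence = "MAPGAAPPGK"
print(find_xaa_pro_gly(toy_sequence))
```

An empty result for a given sequence means that it contains
no Xaa-Pro-Gly triplet at all.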

Using HRMS to analyze posttranslational modifications
has obvious advantages, such as a low protein quantity
requirement, freedom from contaminating proteins in samples,
and reading accuracy. However, it does not give 100%
coverage. In this study the peptide coverage ranged
from 58.66% to 85.81%.
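
How the coverage values were computed is not described
here. A minimal sketch of the usual definition, the
percentage of residues covered by at least one identified
peptide, is given below; the function name and the example
peptide spans are hypothetical.

```python
def sequence_coverage(peptide_spans, protein_length):
    """Percent of residues covered by at least one peptide.

    peptide_spans: iterable of (start, end) residue positions,
    1-based and inclusive; spans may overlap.
    """
    covered = 0
    last_end = 0  # highest residue position counted so far
    for start, end in sorted(peptide_spans):
        start = max(start, last_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / protein_length

# Hypothetical spans on a hypothetical 50-residue protein.
print(sequence_coverage([(1, 10), (5, 20), (31, 40)], 50))
```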

Because posttranslational modification is a continuous 
process in the cells, the collagen isolated from the seeds 
represents a population of protein molecules, i.e., the 
proline hydroxylation may vary from one collagen molecule
to another. In fact, we have performed multiple
HRMS measurements on samples extracted from the same
batch of seeds. We found that while the positions of Pro to
Hyp modification may vary between measurements, the
overall Hyp content remained constant between these
samples.

The thermal stability tests in this report showed that
maize-derived rCIα1-OH could still be detected after
heat treatment at temperatures as high as 35°C (using
anti-foldon antibody) and 38.6°C (using anti-25 kD
antibody) followed by pepsin digestion. The commercial
human collagen control undergoing the same treatment could
only withstand temperatures up to 31°C. Stein et al. [17]
reported that the melting temperatures for their
tobacco-derived collagen heterotrimer and human skin
collagen samples were around 39°C. The high melting
temperature of plant-derived collagen could potentially be
useful for certain industrial applications where a higher
melting temperature is desired, for example, biomaterials
for tissue engineering [32,33].

We recovered the maize-derived rCIα1 from the seed
total soluble proteins using a previously described
protocol [11]. Because collagens are acid-soluble proteins,
the extraction buffer used had a pH of 1.7. Unlike Zhang et
al. [11], we did not perform extensive purification of
rCIα1 before gel electrophoresis and Western blot
analysis. When such acidic rCIα1 solutions were heated to
high temperature, as we normally do before loading protein
gels, we were unable to detect the protein by Western
blot, suggesting that the combination of acidic buffer
and high temperature could be detrimental to collagen
integrity. Therefore, in this study, none of the
maize-derived rCIα1 samples in acidic solutions were boiled
prior to Western blot analysis, to avoid collagen
degradation.

It is interesting to note that both maize- and
Pichia-derived non-hydroxylated rCIα1 were completely
digested by pepsin at 10°C after the temperature treatment
of samples at 4°C in our study (Figure 5A). This
result differs from what was reported in barley [7]
and maize [11], in which plant- and Pichia-derived
rCIα1 were still detectable after heat treatment at
26-27°C. This could be attributed to the different pepsin
treatment protocols used in the experiments. For example,
the pepsin experiments reported by Zhang et al. [11]
were conducted at pH 7, with a 15 minute heat
treatment followed by 150 μg/mL pepsin digestion at 4°C
for 16-18 hr. Ritala et al. [7] conducted the heat
treatment for 6 min before subjecting the samples to 150
μg/mL pepsin digestion at 10°C for 30 min under an
acidic condition. Our conditions were similar to those of
Ritala et al. except that we used 200 μg/mL pepsin for
15 min at pH 1.7. Because pepsin functions best in an
acidic environment, our pepsin digestion at low pH
likely led to the degradation of non-hydroxylated
rCIα1 even at 4°C. Another explanation could be the
quantity of the collagen substrate used in the different
experiments. We estimated that approximately 50-100
ng/reaction of unpurified rCIα1 from CGB seeds was
subjected to pepsin digestion in our study, whereas
Zhang et al. used about 600-700 ng of purified rCIα1 per
reaction [11]. The quantity of collagen used for
pepsin digestion by Ritala et al. [7] was not specified.

We have demonstrated for the first time that
mammalian-like hydroxylation of human rCIα1 can be achieved
in transgenic maize co-expressing a human rP4H.
The Hyp content of maize-derived hydroxylated rCIα1
is comparable to that of the native human version,
leading to a similar thermal stability of the product. The
current expression levels of collagen reported here are
too low for large-scale production, as the desired
accumulation level of recombinant proteins for commercial
production is estimated to be between 250 and 1000 mg/kg
grain [34,35]. Further improvement of recombinant protein
production in plants can be achieved by optimizing
gene expression, including the use of more effective
regulatory elements and protein targeting/retention
sequences, as well as by using conventional breeding
programs to select high-expression lines over generations
[34,36].
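
To put this gap in numbers, the short calculation below
compares the average yields reported in the abstract (12
mg/kg seed without rP4H co-expression, 4 mg/kg seed with
it) against the 250 to 1000 mg/kg target range. It is a
back-of-the-envelope sketch; the variable and function
names are illustrative only.

```python
# Average yields from the abstract (mg/kg seed).
current_yield = {"CGB (no rP4H)": 12.0, "CGD (with rP4H)": 4.0}
# Commercial target range cited in the text (mg/kg grain).
target_low, target_high = 250.0, 1000.0

def fold_increase_needed(current, target):
    """How many times the current yield must rise to reach the target."""
    return target / current

for line, y in current_yield.items():
    low = fold_increase_needed(y, target_low)
    high = fold_increase_needed(y, target_high)
    print(f"{line}: {low:.1f}- to {high:.1f}-fold increase needed")
```

Under these figures the hydroxylated product would need
roughly a 60-fold increase just to reach the lower end of
the target range.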

Conclusions 

In this study we have shown that properly hydroxylated
recombinant human collagen I alpha 1 (rCIα1) can be
produced in maize seed. By co-expressing recombinant
human prolyl 4-hydroxylases (rP4Hs), we have successfully
produced rCIα1 containing Hyp residue levels that
are comparable to native human CIα1. The increased
Hyp content is associated with increased thermal
stability in maize-derived rCIα1. Application of
high-resolution mass spectrometry (HRMS) allowed us to
measure hydroxylated prolines at specific amino acid
positions in different samples. Our findings indicate that
maize seed can be used as a system to produce recombinant
proteins requiring mammalian-like posttranslational
modifications.

Methods 

Vector construction 

Human collagen type I α1 (CIα1) coding sequence
together with its original N- and C-telopeptide
sequences (UniProtKB/Swiss-Prot: P02452) was optimized
by Aptagen LLC (Jacobus, PA) for expression in
maize. The optimized CIα1 sequence was fused with a
29 amino acid bacteriophage foldon peptide sequence
[23] at the C-terminus to produce a protein of 1086
amino acids. Two constructs (Figure 1) were made to
produce either recombinant CIα1 (rCIα1) only (CGB),
or both rCIα1 and recombinant human prolyl-4-hydroxylase
(rP4H, CGD). The rCIα1 gene was regulated by a
maize embryo-specific promoter, globulin-1 [18], with a
3'-terminator from the potato protease inhibitor II (pin II)
gene. Genes encoding the two subunits of rP4H (rP4Hα and
rP4Hβ) were regulated by a maize constitutive promoter
(ubiquitin promoter) and the potato pin II gene
terminator. All three coding sequences (rCIα1, rP4Hα,
and rP4Hβ) in the two constructs were translationally
fused with a barley alpha amylase signal sequence
(BAASS, [19]) at the 5' end. The phosphinothricin acetyl
transferase (bar) gene driven by the cauliflower mosaic
virus (CaMV) 35S promoter was included in both
constructs as a marker for transgenic callus selection.
It confers resistance to the herbicide glufosinate
ammonium (bialaphos) [37-39].

Production of transgenic plants 

Constructs CGB and CGD were introduced into immature
embryos of Hi II maize genotype [40] via an
Agrobacterium-based transformation system [41]. Briefly,
maize immature embryos were infected by Agrobacterium
strain EHA101 [42] containing the above described
vectors and selected on 3 mg/L bialaphos. Regeneration
of transgenic plants from the callus was as previously
described [20]. Seedlings were transplanted into soil in
the greenhouse and allowed to flower and produce seed
through hand-pollinations. Seed increases for multiple
events from CGB and CGD were conducted in greenhouse
and nursery trials. T2 transgenic maize seeds were
used for further analysis in this study.

PCR analysis of transgenic plants

Total genomic DNA was isolated from maize leaf or seed by
the cetyl trimethyl ammonium bromide (CTAB) method [43].
The presence of the transgenes rCIα1, rP4Hα
and rP4Hβ was detected by polymerase chain reaction
(PCR). A typical PCR reaction consisted of 100 ng of genomic
DNA, 0.8 mM dNTPs, 2 mM MgCl2, Taq DNA polymerase
buffer and 0.5 U Taq DNA polymerase (Bioline
USA Inc, Taunton, MA) in a final volume of 25 μL. PCR
was performed under the following conditions for 35 cycles:
30 s denaturation at 94°C, 30 s annealing at 60°C, and 45 s
extension at 70°C. Primers for amplifying rCIα1 were x7-05
(5'-ACCAGATGGGCCGCTCTCACCTTT-3') and x7-06
(5'-TTCCCTGGTGCCGTTGGAGCTA-3'); for rP4Hα,
x7-17 (5'-ATCTCGGCGTCGCTGATGAT-3') and x7-18
(5'-GTGGTCCGAGCTGGAGAACC-3'); and for rP4Hβ,
x7-13 (5'-ATGAAGAACACCTCCTCCCTCTG-3') and x7-14
(5'-TCACAGCTCGTCCTTCACGG-3'). PCR products were
analyzed on a 1% agarose gel. The expected sizes of the
PCR products are 1308 bp (rCIα1), 745 bp (rP4Hα) and
1531 bp (rP4Hβ), respectively. The gel was stained with
ethidium bromide (0.5 μg/ml) for 20 min. Product sizes
were determined using a 1 kb DNA Ladder (cat # N3232S,
New England Biolabs).
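
As a quick sanity check on the primers and cycling program
above, the following Python sketch reports the length and
GC content of each primer and the time spent in the 35
cycles. The primer sequences are transcribed from the text
as contiguous strings; the helper names are hypothetical,
and the time estimate ignores ramping and any initial
denaturation or final extension steps, which are not
specified here.

```python
# Primer sequences (5' to 3') as listed in the text.
PRIMERS = {
    "x7-05": "ACCAGATGGGCCGCTCTCACCTTT",
    "x7-06": "TTCCCTGGTGCCGTTGGAGCTA",
    "x7-17": "ATCTCGGCGTCGCTGATGAT",
    "x7-18": "GTGGTCCGAGCTGGAGAACC",
    "x7-13": "ATGAAGAACACCTCCTCCCTCTG",
    "x7-14": "TCACAGCTCGTCCTTCACGG",
}

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def cycling_seconds(cycles, step_seconds):
    """Total hold time for `cycles` repeats of the given steps."""
    return cycles * sum(step_seconds)

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {100 * gc_content(seq):.1f}%")

# 35 cycles of 30 s denaturation, 30 s annealing, 45 s extension.
total = cycling_seconds(35, (30, 30, 45))
print(f"cycling hold time: {total} s ({total / 60:.2f} min)")
```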

Protein extraction 

Total soluble protein (TSP) from maize seeds was
extracted using an acidic buffer described by Zhang et al.
[11] for collagen preparation. Maize seeds were ground in a
coffee grinder (Mr. Coffee) for 1 min. For rCIα1 extraction,
extraction buffer (0.1 M phosphoric acid, 0.15 M
sodium chloride, pH 1.7) was added to the seed powder
at a ratio of 1:10 (w/v). For rP4H extraction,
extraction buffer [25 mM sodium phosphate (pH 6.6),
100 mM sodium chloride, 0.1% Triton X-100 (v/v), 1
mM EDTA, 10 μg/mL leupeptin, and 0.1 mM serine
protease inhibitor Pefabloc SC (Fluka)] was added to
the seed powder at a ratio of 1:10 (w/v). The mixture
was incubated in a shaker incubator (250 rpm, 37°C) for
0.5 hour for rCIα1 and one hour for rP4H. The mixtures
were then centrifuged at 13,000 rpm for 10 min at
room temperature in a benchtop centrifuge. The
supernatants were transferred to clean tubes for further
analysis. Some protein samples were concentrated with an
Amicon Ultra-15 Centrifugal Filter Unit with Ultracel-30
membrane (cat # UFC903008, Millipore) following
the product instructions. In short, 15 mL of total seed
protein extract was loaded into the filter device and
centrifuged at 3000 × g for approximately 2-3 hours at 4°C.
Concentrated samples were recovered by withdrawing
with a pipettor. The concentration factor was determined
from the volume and could be adjusted by controlling
the centrifugation time.

ELISA 

A competitive ELISA procedure developed by FibroGen
and described by Zhang et al. [29] was used with minor
modifications. Briefly, ELISA plates (cat # 3590, Corning)
were coated overnight at 4°C with 5 ng per well of
heat-denatured (65 ± 5°C for 30 minutes) non-hydroxylated
rCIα1 from Pichia pastoris (FE301, [7]) in phosphate
buffered saline (PBS, cat # 21-040-CV, Mediatech).
After washing with washing buffer (10 mM PBS, 0.05%
Tween 20, pH 7.0), the plates were blocked with 2% dry
milk in 100 mM PBS for 1 hour at room temperature.
After three washes with washing buffer, heat-denatured
samples and standard (FE301) in assay buffer (100 mM
PBS, 0.05% Tween 20, 1% dry milk, pH 7.0) were added
to the plates. The primary antibody, rabbit polyclonal
anti-25 kDa CIα1 (CA725, FibroGen), was added
immediately at a 1:4000 dilution in the assay buffer.
After 1 hour incubation at room temperature, plates
were washed three times with washing buffer. The goat
anti-rabbit IgG (H+L) HRP conjugate (cat # 81-6120,
Zymed) was added at a 1:5000 dilution in the assay
buffer, followed by incubation at room temperature for 1
hour. After three washes with washing buffer, 100 μL/well
of Sure Blue TMB substrate solution (cat # 52-00-01,
Kirkegaard & Perry Laboratories) was added. The
plates were then read at 620 nm on a microplate reader
(KC4, Biotek) after incubation at room temperature for
30 minutes.

Western blotting 

Forty microliters of protein extract from maize seed
were mixed with 8 μL of Laemmli sample buffer (cat #
161-0737, Bio-Rad) and then loaded onto a 4-15%
polyacrylamide SDS-PAGE gel (cat # 161-1158, Bio-Rad). To
avoid protein degradation caused by the combination of
acidic pH and high temperature (X. Xu, unpublished), the
step of boiling samples prior to loading was omitted. The
proteins separated on the gel were transferred to a 0.45 μm
nitrocellulose membrane using a Bio-Rad Semidry
Trans-blotting apparatus according to the manufacturer's
instructions. Membranes were incubated in blocking
buffer (138 mM sodium chloride, 2.7 mM potassium
chloride, pH 7.4, 0.1% Tween-20, 5% dry milk powder)
for 1 hour at room temperature on a rotary shaker. The
membrane was then incubated for 1 hour in blocking
buffer with a 1:1000 dilution of anti-foldon antibody
(rabbit anti-sera with 0.01% sodium azide) for rCIα1,
or with a 1:1000 dilution of anti-P4Hβ antibody (cat #
63-164, ICN Biomedicals) for rP4Hβ. After washing
with washing buffer (138 mM sodium chloride, 2.7 mM
potassium chloride, pH 7.4, 0.05% Tween-20) 4 times (5
min each wash), the membrane was incubated for
1 hour in blocking buffer with a 1:5000 dilution of
HRP-goat anti-rabbit IgG (H+L) secondary antibody (cat #
62-6120, Zymed) for rCIα1, or with a 1:5000 dilution
of HRP-goat anti-mouse IgG (H+L) secondary antibody
(cat # 62-6520, Zymed) for rP4Hβ. After washing the
membrane with washing buffer 4 × 5 min, the excess
buffer was drained off and the membrane was
transferred into a clean container. Bands appeared within
10 minutes of incubation with the horseradish peroxidase
substrate 3,3',5,5'-tetramethylbenzidine (cat # T0565,
Sigma).

High-resolution mass spectrometry (HRMS) analysis 

To prepare maize-derived rCIα1, 10 μg of total soluble
proteins extracted from seeds was separated on a 4-15%
polyacrylamide SDS-PAGE gel, followed by staining with
Bio-Safe Coomassie Stain (cat # 161-0786, Bio-Rad). For
purified collagen control samples, three micrograms each of
Pichia-derived non-hydroxylated collagen (FE291),
Pichia-derived hydroxylated collagen (FE285), and
human collagen (cat # 234138, CalBiochem Inc.) were
loaded on the gel. After electrophoresis, collagen bands
were excised from the gels and sent to the Proteomics
& Mass Spectrometry Facility at the Donald Danforth Plant
Science Center, St. Louis, MO for analysis. The samples
were digested with trypsin automatically by a
MultiProbe II protein digester (PerkinElmer) in a
temperature-controlled enclosed environment. After
digestion, samples were run by LC-MS/MS on the Linear Ion
Trap Orbitrap (LTQ-Velos Orbitrap, ThermoFisher
Scientific). For post-translational modification analysis,
the numbers of Hyp and Pro from each sample were
counted and compared.
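
The counting step can be illustrated with a small Python
sketch. The exact denominator behind the reported Hyp
percentages is not stated here, so the version below
assumes, purely for illustration, that Hyp is expressed as
a share of all Pro-derived residues (Hyp plus unmodified
Pro) observed in the covered peptides. The function name
and the example counts are hypothetical and are not data
from this study.

```python
def hyp_percentage(n_hyp, n_pro):
    """Hyp as a percentage of Hyp + unmodified Pro (assumed denominator)."""
    total = n_hyp + n_pro
    if total == 0:
        raise ValueError("no Pro or Hyp residues were counted")
    return 100.0 * n_hyp / total

# Hypothetical counts for illustration only.
print(hyp_percentage(n_hyp=20, n_pro=80))
```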

Thermal stability analysis 

The melting temperature (Tm) of CIα1 samples was
determined by pepsin digestion after heat treatment [23].
Twenty-five microliters of total soluble protein extracted
from CGB and CGD maize seed was subjected to heat
treatment in a thermocycler (Biometra GmbH,
Germany) at 4°C, or at temperatures ranging from 29°C to
38.6°C, for 6 min. For positive controls, 1.4 μg of
Pichia-derived rCIα1 in hydroxylated (FE285) and
non-hydroxylated (FE291) forms, and human collagen, were
also treated. After heat treatment, all protein samples
were incubated at 10°C with or without pepsin (0.2 mg/mL
final concentration, cat # P6887, Sigma) for 15 min.
Digestion results were analyzed by Western blotting using
anti-foldon and anti-25 kDa collagen antibodies.
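
One simple way to summarize such a blot readout is to
record, for each heat-treatment temperature, whether a
band survived pepsin digestion, and then report the highest
temperature at which it did. The Python sketch below does
this; it is a simplification and not the authors' analysis
procedure. The function name is hypothetical, and the
temperature series and band calls are made-up example data.

```python
def highest_resistant_temperature(band_detected):
    """Highest heat-treatment temperature (°C) with a band after pepsin.

    band_detected: dict mapping temperature in °C to True/False.
    Returns None if no band was detected at any temperature.
    """
    resistant = [temp for temp, seen in band_detected.items() if seen]
    return max(resistant) if resistant else None

# Hypothetical readout for illustration only.
example = {4.0: True, 29.0: True, 33.0: True, 35.0: True, 38.6: False}
print(highest_resistant_temperature(example))
```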